08. Scheduling in Airflow

Schedules in Airflow

Start Date

Airflow begins running a pipeline on its start date. When a DAG's start date is in the past and the time between the start date and now spans one or more complete schedule intervals, Airflow automatically schedules and executes a DAG run for each of those intervals. This is useful when an organization has accumulated years of historical data that needs to be analyzed retroactively.
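
The catch-up behavior described above can be sketched in plain Python. This is only an illustration of the arithmetic, not Airflow's scheduler code: the function name pending_intervals and the example dates are hypothetical, and the sketch assumes a fixed-length interval and that a run is created only once its interval has fully elapsed.

```python
from datetime import datetime, timedelta

def pending_intervals(start_date, now, interval):
    """Return (interval_start, interval_end) pairs that have fully elapsed."""
    runs = []
    cursor = start_date
    # Only count intervals that have completely finished before `now`.
    while cursor + interval <= now:
        runs.append((cursor, cursor + interval))
        cursor += interval
    return runs

# Hypothetical example: a daily pipeline whose start date is 3.5 days in the past.
runs = pending_intervals(
    start_date=datetime(2024, 1, 1),
    now=datetime(2024, 1, 4, 12, 0),
    interval=timedelta(days=1),
)
for interval_start, interval_end in runs:
    print(interval_start.date(), "->", interval_end.date())
```

In this example three daily intervals have completed, so three runs would be owed; the half-finished fourth day is not scheduled yet.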

End Date

Airflow pipelines can also have end dates. Set an end_date on a pipeline to tell Airflow when to stop running it. An end_date is also useful when you overhaul or redesign an existing pipeline: set an end_date on the old pipeline, then have the new pipeline start on that same date.
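
The hand-off between an old and a new pipeline can be checked with a small plain-Python sketch. The helper intervals and the cut-over date are hypothetical, and the model is simplified: it treats the end date as the point where the last interval closes. Airflow's own handling of the run that falls exactly on the boundary may differ, so verify it against the version you use.

```python
from datetime import datetime, timedelta

def intervals(start_date, end_date, interval):
    """List (start, end) intervals that fit entirely between start_date and end_date."""
    out = []
    cursor = start_date
    while cursor + interval <= end_date:
        out.append((cursor, cursor + interval))
        cursor += interval
    return out

DAY = timedelta(days=1)
handoff = datetime(2024, 3, 1)  # hypothetical cut-over date

# Old pipeline ends at the hand-off; new pipeline starts there.
old_runs = intervals(datetime(2024, 2, 27), handoff, DAY)
new_runs = intervals(handoff, datetime(2024, 3, 3), DAY)

# Under this model, the old pipeline's last interval closes exactly
# where the new pipeline's first interval opens: no gap, no overlap.
print(old_runs[-1][1] == new_runs[0][0])
```

Using the same date for the old end_date and the new start date keeps the covered time range contiguous in this simplified model.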

Schedules

How are schedules used by data pipelines?

SOLUTION: Determine what data should be analyzed and when

Airflow Schedules

Which of the following are used by Airflow to determine schedules?

SOLUTION:
  • start_date
  • end_date
  • schedule_interval

End Date

True or False: End date is required by Airflow Schedules.

SOLUTION: False

True or False: Start date is required by Airflow Schedules.

SOLUTION: True

True or False: Schedule interval is required by Airflow Schedules.

SOLUTION: False